ATOL: A Framework for Automated Analysis and Categorization of the Darkweb Ecosystem

Authors

  • Shalini Ghosh
  • Phillip Porras
  • Vinod Yegneswaran
  • Ken Nitz
  • Ariyam Das
Abstract

We present a framework for automated analysis and categorization of .onion websites in the darkweb to facilitate analyst situational awareness of new content that emerges from this dynamic landscape. Over the last two years, our team has developed a large-scale darkweb crawling infrastructure called OnionCrawler that acquires new onion domains on a daily basis, and crawls and indexes millions of pages from these new and previously known .onion sites. It stores this data in a research repository designed to help better understand Tor’s hidden service ecosystem. The analysis component of our framework is called Automated Tool for Onion Labeling (ATOL), which introduces a two-stage thematic labeling strategy: (1) it learns descriptive and discriminative keywords for different categories, and (2) it uses these terms to map onion site content to a set of thematic labels. We also present empirical results of ATOL and our ongoing experimentation with it, drawing on our experience applying it to the entirety of our darkweb repository, which now holds over 70 million indexed pages. We find that ATOL can perform site-level thematic label assignment more accurately than keyword-based schemes developed by domain experts: we expand the analyst-provided keywords using an automatic keyword discovery algorithm, and obtain a 12% gain in accuracy by using a machine learning classification model. We also show how ATOL can discover categories on previously unlabeled onions and discuss applications of ATOL in supporting various analyses and investigations of the darkweb.
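The two-stage strategy described in the abstract can be illustrated with a minimal sketch. This is not ATOL's actual algorithm, which the abstract does not specify. It assumes a simple TF-IDF-style score for stage 1 and plain keyword-count voting for stage 2, standing in for the machine learning classifier the authors report. The function names (`learn_keywords`, `assign_label`), the category names, and the toy documents are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def learn_keywords(labeled_docs, top_k=3):
    """Stage 1 (assumed heuristic): pick the top_k terms per category.

    A term scores highly when it appears in many documents of one
    category and in few other categories.
    """
    doc_freq = defaultdict(Counter)  # category -> term -> docs containing it
    for text, label in labeled_docs:
        doc_freq[label].update(set(tokenize(text)))

    n_categories = len(doc_freq)
    categories_with_term = Counter()
    for counts in doc_freq.values():
        categories_with_term.update(counts.keys())

    keywords = {}
    for label, counts in doc_freq.items():
        scores = {
            term: count * math.log((1 + n_categories) / categories_with_term[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        keywords[label] = set(ranked[:top_k])
    return keywords


def assign_label(text, keywords):
    """Stage 2 (assumed heuristic): label text by keyword hit counts.

    Returns None when no learned keyword occurs in the text.
    """
    tokens = Counter(tokenize(text))
    hits = {
        label: sum(tokens[term] for term in terms)
        for label, terms in keywords.items()
    }
    best = max(sorted(hits), key=lambda label: hits[label])
    return best if hits[best] > 0 else None


# Hypothetical toy corpus; real input would be crawled page text.
labeled_docs = [
    ("bitcoin wallet mixer bitcoin escrow", "finance"),
    ("bitcoin escrow service for wallet", "finance"),
    ("forum board posts discussion threads", "forum"),
    ("discussion forum threads for members", "forum"),
]
keywords = learn_keywords(labeled_docs)
label = assign_label("new escrow market accepts bitcoin", keywords)
```

In this sketch a term shared by every category (such as "for") receives a low score and is not selected, which is the sense in which the learned keywords are discriminative. Replacing the voting step with a trained classifier over the keyword features would be closer to the approach the abstract reports.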




Publication date: 2017